Olson's conditional-logistic model retains the nice property of the LOD score formulation
and has advantages over other methods that make it an appropriate choice for complex 
trait linkage mapping. However, the asymptotic distribution of the conditional-logistic 
likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is 
unknown for some analysis models, even in the case of samples comprising only 
independent sib pairs. We derive approximations to the asymptotic null distributions of 
the CL-LR statistics and compare them with the empirical null distributions by simulation 
using independent affected sib pairs. Generally, the empirical null distributions of the 
CL-LR statistics match well the known or approximated asymptotic distributions for all 
analysis models considered except for the covariate model with a minimum-adjusted 
binary covariate. This work will provide useful guidelines for linkage analysis of real data 
sets for the genetic analysis of complex traits, thereby contributing to the identification of 
genes for disease traits. 

Keywords: linkage analysis, affected sib pairs, identity-by-descent, conditional-logistic model, genetic constraints, 
null distribution, likelihood-ratio statistics 



INTRODUCTION 

In the study of human data by genetic linkage analysis, the tradi- 
tional LOD score method, also called a "parametric" or "model- 
based" method because it requires information about an assumed 
genetic model, is efficient for single-gene Mendelian traits but is 
much less well suited for the analysis of traits with complex non- 
Mendelian modes of inheritance. In the absence of a well-defined 
disease inheritance model, alternative robust "non-parametric," 
"weakly-parametric" or "model-free" linkage methods, which do 
not require the specification of a disease model, have been used 
for deciphering the genetic basis of complex traits. 

One such approach that has been extremely useful in the anal- 
ysis of human genetic diseases is the affected sib pair (ASP) study 
design, as in tests based on the mean proportion of identity-by- 
descent (IBD) sharing (Blackwelder and Elston, 1985) or tests 
based on the likelihood-ratio (LR) defined by Risch (1990a,b) 
that uses the same one-parameter model to analyze ASPs or any 
other affected unilineal relative pairs by producing a LOD score. 
Holmans (1993) extended Risch's maximum LOD score method 
into a two-parameter model for ASPs, but with the genetic con- 
straints required for single locus Mendelian inheritance; here 
we call this the Risch and Holmans (RH) model. Olson (1999) 
proposed a general conditional-logistic (CL) model that com- 
bines several extensions and modifications (Cordell et al., 1995; 
Rogus and Krolewski, 1996; Greenwood and Bull, 1997, 1999; 
Olson, 1997; Lunetta and Rogus, 1998) into a unified framework: 
the likelihood is conditioned on sampling affected relative pairs 
(ARPs) and the parameterization is done in terms of the log- 
arithm of allele sharing specific relative risks, instead of allele 
sharing probabilities as in the RH model. The CL model not only
retains the "nice" property of the LOD score formulation of the
RH model, i.e., it is additive over independent sets of data, but it
also has advantages over the RH model. First, it is valid for any type of
ARPs with the same allele sharing specific parameters. In contrast,
the RH model is parameterized in terms of relative-type specific
IBD probabilities, so it can accommodate only one ARP type at a
time. Second, the CL model allows covariate effects to be
incorporated by re-parameterizing the model
in terms of the logarithms of genetic relative risk parameters. A
modification of this original two-parameter CL model into a
one-parameter model was proposed by Goddard et al. (2001). Linkage
analysis using the CL model has proven to be an effective
tool for evaluating genetic linkage (Goddard et al., 2001;
Arcos-Burgos et al., 2004; Reck et al., 2005; Doan et al., 2006; Rybicki
et al., 2007; Stein et al., 2007; Zandi et al., 2007; Song et al., 2011).

One limitation of the general two-parameter CL model is that the
asymptotic distribution of the test statistic is unknown in certain cases
when single-locus genetic constraints are imposed on the model parameters,
even when analyzing only independent ASPs. Because of
the genetic constraints (Holmans, 1993), the distributions of the
CL-LR (i.e., 2 ln(10) x LOD score) statistics for linkage are
mixtures of $\chi^2$ distributions that are difficult to specify. The use of
simulation methods to obtain p-values has been recommended
to ensure accuracy of the inference in complex situations (Olson,
1999). Although gene-dropping techniques can be used for this
purpose, the ideal method to infer the statistical significance of a
test statistic is to compare it with its permutation distribution.
When analyzing affected pairs alone, however, permuting the
allele sharing of relative pairs does not lead to a useful
permutation distribution. As an alternative, Sinha et al. (2006) developed
regression prediction models that provide more accurate p-values
under the CL model framework. However, their results are limited
to the cases they evaluated, so they do not provide a general solution
for the unknown distribution of the CL-LR statistic.

Here, we first derive approximations to the asymptotic dis- 
tributions of the CL-LR statistics when using the constrained 
two-parameter analysis model for independent ASPs. The deriva- 
tion is done under the null hypothesis of no linkage and assuming 
complete marker information, by following Self and Liang (1987), 
as done for the RH model (Holmans, 1993; Whittemore and 
Tu, 1998; Feng et al., 2006). Next, we study the empirical null 
distributions of the CL-LR statistics by simulation, again for inde- 
pendent ASPs, examining several analysis models with different 
constraints on the model parameters when using the LODPAL 
program in the S.A.G.E. package (2012). Then, we compare these 
distributions to the derived asymptotic distributions - either 
known or approximated in the previous step. 

MATERIALS AND METHODS 
CONDITIONAL-LOGISTIC MODEL 

We first briefly describe the original two-parameter CL model
from Olson (1999). The unconditional (prior) probability that a
pair of type r relatives shares i alleles IBD is denoted as $f_{ri}$, and the
estimated probability that the pair shares i alleles IBD conditional
on the available marker data $I_m$ is denoted as $\hat{f}_{ri}$. Then the
likelihoods under the null hypothesis ($H_0$) of no linkage and under the
alternative ($H_1$) can be written as

$$H_0: L(\lambda_1 = 1, \lambda_2 = 1) = P(I_m \mid r)$$

and

$$H_1: L(\lambda_1, \lambda_2) = P(I_m \mid r)\,\frac{\sum_{i=0,1,2} \lambda_i \hat{f}_{ri}}{\sum_{i=0,1,2} \lambda_i f_{ri}},$$

where $\lambda_i$ is the relative risk to an individual who shares i alleles
IBD (i = 0, 1, 2) with an affected relative. Equating with the notation
used in the RH model, $\lambda_0 = \lambda_u$ (= 1) is the relative risk for
unrelated individuals, $\lambda_1 = \lambda_o$ is the offspring relative risk, and
$\lambda_2 = \lambda_m$ is the MZ-twin relative risk. The CL model is
parameterized in terms of the logarithms of relative risk, so $\lambda_i = e^{\beta_i}$.
Under the null hypothesis of no linkage, the parameters $(\beta_1, \beta_2) = (0, 0)$
correspond to Risch's allele sharing probability parameters
$(z_1, z_2) = (1/2, 1/4)$, where $z_1$ and $z_2$ are the respective probabilities
that an ASP shares 1 and 2 alleles IBD at a locus. The LR contribution
for an ARP of type r is

$$LR = \frac{\sum_{i=0,1,2} \lambda_i \hat{f}_{ri}}{\sum_{i=0,1,2} \lambda_i f_{ri}},$$

and for a sample of independent ARPs the LOD score is obtained by summing the
base-10 logarithms of the pair-specific LRs. For the test of linkage,
this LOD score is maximized over a possible range of the parameter
space that depends on the constraints imposed, as discussed
in the following section. For details of the derivation of the LR
and the equivalence of the LR whether the parameterization is in
terms of allele sharing probabilities or allele sharing relative risks,
we direct the reader to Olson (1999).
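
As a minimal sketch of the LOD score described above, the following Python fragment sums the base-10 logarithms of the pair-specific LRs. The function names and the three example pairs are hypothetical and chosen only for illustration; this is not the LODPAL implementation.

```python
import math

def pair_lr(lam, f_prior, f_hat):
    """LR contribution of one affected relative pair.

    lam     : (lambda_0, lambda_1, lambda_2), relative risks by IBD count
    f_prior : prior IBD probabilities (f_r0, f_r1, f_r2) for the pair type
    f_hat   : IBD probabilities estimated from the marker data
    """
    num = sum(l * fh for l, fh in zip(lam, f_hat))
    den = sum(l * fp for l, fp in zip(lam, f_prior))
    return num / den

def lod_score(beta1, beta2, pairs):
    """Sum of log10 pair-specific LRs, with lambda_i = exp(beta_i), beta_0 = 0."""
    lam = (1.0, math.exp(beta1), math.exp(beta2))
    return sum(math.log10(pair_lr(lam, fp, fh)) for fp, fh in pairs)

ASP_PRIOR = (0.25, 0.5, 0.25)          # prior IBD probabilities for sib pairs
pairs = [                               # illustrative (made-up) IBD estimates
    (ASP_PRIOR, (0.0, 0.0, 1.0)),
    (ASP_PRIOR, (0.0, 1.0, 0.0)),
    (ASP_PRIOR, (0.1, 0.5, 0.4)),
]
null_lod = lod_score(0.0, 0.0, pairs)   # every LR equals 1 at the null
alt_lod = lod_score(0.2, 0.5, pairs)
```

Because the score is a sum over pairs, evaluating it on the whole sample gives the same value as adding the scores of disjoint subsets, which is the additivity property noted in the Introduction.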

When the parameters $\beta_1$ and $\beta_2$ are completely free without
any constraints, the parameter space is the whole 2-dimensional
plane with two coordinate axes defined by the two parameters.
The values of the two parameters under the null hypothesis are
interior points of this parameter space, and so the CL-LR
statistic under the null hypothesis of no linkage is distributed as
$\chi^2_2$ asymptotically. We refer to this model as the unconstrained
two-parameter model.

When the (pure single-locus etiology) genetic constraints
(Holmans, 1993) are imposed, the parameters $\beta_1$ and $\beta_2$ are
constrained to be $\beta_1 \ge 0$ and $\beta_2 \ge \log_e(2e^{\beta_1} - 1)$, or equivalently,
$\lambda_1 \ge 1$ and $\lambda_2 \ge 2\lambda_1 - 1$, to reflect the possible allele sharing
probabilities for ASPs. In this case, the values of the parameters
under the null hypothesis are on the edge of the parameter space,
so that the LR statistic is asymptotically distributed as the mixture
$(\tfrac{1}{2} - c)\chi^2_0 + \tfrac{1}{2}\chi^2_1 + c\,\chi^2_2$, with the mixing proportion c
representing the probability that the allele sharing estimates fall inside a
triangle that is part of the two-dimensional plane. We refer to this
model as the constrained two-parameter model.
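
To make the mixture concrete, the sketch below converts a LOD score into an asymptotic p-value under the constrained two-parameter model. It uses the closed-form upper tails of the chi-square distribution for 0, 1, and 2 df; the function names are hypothetical, and the mixing proportion c must be supplied by the user.

```python
import math

LN10 = math.log(10.0)

def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df in {0, 1, 2}."""
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0                          # point mass at 0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 0, 1, 2 are supported in this sketch")

def mixture_pvalue(lod, c):
    """p-value under (1/2 - c) chi2_0 + 1/2 chi2_1 + c chi2_2."""
    x = 2.0 * LN10 * lod                    # CL-LR statistic = 2 ln(10) x LOD
    return ((0.5 - c) * chi2_sf(x, 0)
            + 0.5 * chi2_sf(x, 1)
            + c * chi2_sf(x, 2))

p_rh = mixture_pvalue(0.742, 0.098)         # RH-model c with its 0.05 critical LOD
```

With c = 0.098 and a LOD score of 0.742, the function returns a value very close to 0.05, in agreement with the RH-c row of Table 1.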

MIXING PROPORTION c 

The mixing proportion c is a function of the expected information
matrix. For the RH model with allele sharing parameters, it
has been derived to be $c \approx 0.098$ when there is complete marker
information (Holmans, 1993; Whittemore and Tu, 1998; Feng
et al., 2006), regardless of the choice of any two free parameters,
i.e., $(z_0, z_1)$, $(z_0, z_2)$, or $(z_1, z_2)$. However, for the CL model
with the parameters in terms of the logarithms of relative risk, this
value is unknown. We apply the method of Self and Liang (1987),
as for the RH model, to derive the mixing proportion c for the LR
statistic in the genetically constrained two-parameter CL model.

As shown in Figure 1, let $(\beta_1, \beta_2)$ represent a point in the
2-dimensional plane with two coordinate axes that are defined
by the parameters $\beta_1$ and $\beta_2$, constrained to be $\beta_1 \ge 0$,
$\beta_2 \ge \log_e(2e^{\beta_1} - 1)$ (gray area). We first define the three vertices of
possible triangles in the $(\beta_1, \beta_2)$ plane. Let N = (0, 0) be the
null point, A denote an additive inheritance point, and D a
dominant inheritance point. The point A will be on the line
$\beta_2 = \log_e(2e^{\beta_1} - 1)$. We define $D = (0, \beta_2)$ as a point on the
$\beta_2$ axis where the value of $\beta_2$ is the same as at the point A, as in
Figure 1. Let I be the Fisher information matrix of the likelihood
function $L(\mathrm{data} \mid \beta_1, \beta_2)$ evaluated at the null values. Assuming
complete information, the variance-covariance matrix of the
parameters is the inverse of I, i.e.,

$$I^{-1} = \begin{pmatrix} 6 & 4 \\ 4 & 8 \end{pmatrix}.$$

Let $P \Lambda P^T$ be the spectral decomposition of $I^{-1}$, and $Y_N$, $Y_A$, and $Y_D$ be the
orthogonally transformed vertices of N, A, and D such that
$Y = \Lambda^{1/2} P^T (\beta - N)$. Let $y_N$, $y_A$, and $y_D$ be the rotated vertices of $Y_N$,
$Y_A$, and $Y_D$ such that $y_A$ lies on the $\beta_1$ axis and the ray defined by the
two points $y_N$ and $y_D$ becomes the hypotenuse in the upper right
quadrant of the plane. Now, the three rotated vertices $y_N$, $y_A$, and
$y_D$ define the triangle area in the orthogonal space, and the angle
$\theta$ formed by the two rays $y_N y_A$ and $y_N y_D$ represents the mixing
proportion c. Letting the end point of the hypotenuse be (x, y),
$\theta = \arctan(y/x)$ and $c = \theta/(2\pi)$.

FIGURE 1 | The three points (A1, A2, and A3) used to approximate the
relation between $\beta_1$ and $\beta_2$ and the upper bound of $\beta_1$ under genetic
constraints in the CL model. The corresponding dominant points are
denoted (D1, D2, and D3), and the shaded area is the possible triangle area
in the CL model.

If a model with no dominance genetic variance is to be fitted, then
$\beta_2 = \log_e(2e^{\beta_1} - 1)$, as shown by a solid red line in Figure 1.
Owing to the fact that this line is not straight, the angle $\theta$ differs
according to the choice of the point A on the line. The point
A depends on both the assumption we make about the relation
between $\beta_1$ and $\beta_2$, and the upper value of $\beta_1$ that is chosen.
We consider 3 different points for A, denoted A1, A2, and A3,
as shown in Figure 1. First, under the A1 assumption, we take
the exact relation between $\beta_1$ and $\beta_2$, i.e., $\beta_2 = \log_e(2e^{\beta_1} - 1)$,
and approximate the angle $\theta$ under the assumption that $\beta_1$
represents the allele sharing probability $z_1$, which has maximum value
1/2. Second, with the A2 assumption, we approximate a straight
line about the null value using a Taylor series expansion, i.e.,
$\beta_2 = 2\beta_1$ (dotted red line in Figure 1). In this case, the upper
bound of $\beta_1$ is irrelevant. This is equivalent to using the triangle
obtained from the constraints on $\lambda$, i.e., $\lambda_2 = 2\lambda_1 - 1$. Third,
with the A3 assumption, we take the exact relation between $\beta_1$
and $\beta_2$ and approximate the angle $\theta$ under the assumption that
$\beta_1$ can go up to 1. This is equivalent to assuming the maximum
offspring relative risk $\lambda_1 = \lambda_o \approx 2.718$. We derive the resulting
mixing proportions for these 3 cases and expand them for more
values in the results section.
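
The angle construction can be sketched numerically as follows. The sketch assumes that the inverse information matrix for a fully informative ASP at the null is (6 4; 4 8), and it uses the fact that the angle between the transformed vectors $\Lambda^{1/2}P^T a$ and $\Lambda^{1/2}P^T d$ can be obtained directly from inner products of the form $a^T I^{-1} d$, so no explicit eigen-decomposition or rotation is needed. All function names are hypothetical.

```python
import math

# Assumed inverse Fisher information at the null for a fully informative ASP.
V = ((6.0, 4.0), (4.0, 8.0))

def quad(u, w):
    """Inner product u^T V w, i.e. the dot product after the Lambda^(1/2) P^T map."""
    return sum(u[i] * V[i][j] * w[j] for i in range(2) for j in range(2))

def mixing_proportion(a):
    """c = theta / (2 pi) for additive point A = (beta1, beta2), D = (0, beta2)."""
    d = (0.0, a[1])
    cos_theta = quad(a, d) / math.sqrt(quad(a, a) * quad(d, d))
    return math.acos(cos_theta) / (2.0 * math.pi)

def additive_point(beta1):
    """Point on the exact no-dominance line beta2 = log(2 exp(beta1) - 1)."""
    return (beta1, math.log(2.0 * math.exp(beta1) - 1.0))

c1 = mixing_proportion(additive_point(0.5))   # A1: beta1 up to 1/2
c2 = mixing_proportion((1.0, 2.0))            # A2: linearized line beta2 = 2 beta1
c3 = mixing_proportion(additive_point(1.0))   # A3: beta1 up to 1
```

Under these assumptions the three values come out near 0.050, 0.044, and 0.054, matching the mixing proportions reported in the Results for A1, A2, and A3.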

ONE-PARAMETER MODEL 

Goddard et al. (2001) proposed to modify the two-parameter
model into a one-parameter model on the basis of the minmax
model developed by Whittemore and Tu (1998). In
this one-parameter model, the constraint $\lambda_2 = (\pi + 1)\lambda_1 - \pi$
was imposed, where $\pi$ is a parameter associated with the
mode of inheritance and is fixed to be 2.634, i.e.,
$\beta_2 = \log_e(3.634e^{\beta_1} - 2.634)$ (Olson, 2002). This constraint assumes
a genetic model approximately halfway between a recessive and
a dominant mode of inheritance, which has been shown to be
usually more powerful for most genetic models.



For this one-parameter model, the CL-LR statistic is known
to be asymptotically distributed as a $\chi^2_1$ when $\beta_1$ is free without
any constraints, because its null value is an interior point of
the parameter line. Even though Whittemore and Tu's minmax
constraint is already imposed to make it a one-parameter model,
we refer to this model as the unconstrained one-parameter model
because $\beta_1$ is completely free without any genetic constraints.
When the parameter space for $\beta_1$ is constrained by $\beta_1 \ge 0$
(equivalently $\lambda_1 \ge 1$) to reflect non-negative allele sharing probabilities,
the CL-LR statistic is asymptotically distributed as a 50:50 mixture
of a point mass at 0 and $\chi^2_1$. We refer to this as the constrained
one-parameter model.
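
The constrained one-parameter model can be illustrated with a small sketch that maximizes the LOD score over $\beta_1 \ge 0$ for fully informative ASPs, where each pair's IBD count is observed exactly. The grid search, the upper bound of 3.0, and all names are assumptions made for illustration and do not reproduce the maximization routine used in LODPAL.

```python
import math

MODE_PARAM = 2.634                 # fixed mode-of-inheritance parameter
F = (0.25, 0.5, 0.25)              # prior IBD probabilities for sib pairs

def lod_one_param(beta1, counts):
    """LOD for IBD counts (n0, n1, n2) under lambda2 = 3.634*lambda1 - 2.634."""
    lam1 = math.exp(beta1)
    lam2 = (MODE_PARAM + 1.0) * lam1 - MODE_PARAM
    lam = (1.0, lam1, lam2)
    denom = sum(f * l for f, l in zip(F, lam))
    # With complete marker information the LR of a pair sharing j alleles
    # reduces to lambda_j / sum_i(f_i * lambda_i).
    return sum(n * math.log10(l / denom) for n, l in zip(counts, lam))

def max_lod_constrained(counts, upper=3.0, steps=3000):
    """Grid maximization over 0 <= beta1 <= upper; returns (max LOD, beta1)."""
    grid = (upper * k / steps for k in range(steps + 1))
    return max((lod_one_param(b, counts), b) for b in grid)

lod_excess, beta_excess = max_lod_constrained((15, 50, 35))   # excess sharing
lod_deficit, beta_deficit = max_lod_constrained((35, 50, 15)) # deficit of sharing
```

When sharing is below its null expectation, the maximum sits on the boundary $\beta_1 = 0$ with a LOD score of 0, which is the source of the point mass at 0 in the 50:50 mixture.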

COVARIATES

If there are K covariates in the model, assuming a log-linear
(i.e., multiplicative) effect of the covariate on genetic relative risk,
which is a common, natural, and flexible way to model relative
risk in general epidemiology (Olson, 1999), the relative risk is

$$\lambda_i = \exp\Big(\beta_i + \sum_{k=1}^{K} \delta_{ik} x_k\Big),$$

where the $\delta_{ik}$ (i = 1, 2) are the two parameters
associated with the covariate $x_k$, with $\beta_0 = \delta_{0k} = 0$. Therefore,
each covariate added requires two additional parameters for the
two-parameter model but only one additional parameter for the
one-parameter model.
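
A short sketch of the log-linear covariate parameterization is given below. The function name and the numerical values are hypothetical; the only assumption carried over from the text is that the parameters for i = 0 are zero, so the relative risk for pairs sharing no alleles IBD is always 1.

```python
import math

def relative_risks(beta, delta, x):
    """Relative risks (lambda_0, lambda_1, lambda_2) for one pair.

    beta  : (beta_1, beta_2)
    delta : two rows of covariate effects, delta[i - 1][k] for i = 1, 2
    x     : pair-level covariate values (x_1, ..., x_K)
    """
    lam = [1.0]                      # beta_0 = delta_0k = 0, so lambda_0 = 1
    for b, d_row in zip(beta, delta):
        lam.append(math.exp(b + sum(d * xk for d, xk in zip(d_row, x))))
    return lam

# One covariate (K = 1): two extra parameters in the two-parameter model.
lam_at_zero = relative_risks((0.1, 0.2), ((0.3,), (0.5,)), (0.0,))
lam_at_one = relative_risks((0.1, 0.2), ((0.3,), (0.5,)), (1.0,))
```

At a covariate value of 0 the relative risks reduce to $e^{\beta_1}$ and $e^{\beta_2}$, which is why the choice of centering (mean or minimum) matters for the constraints discussed next.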

When there are no constraints imposed on the covariate
parameters, with the addition of K covariates the CL-LR statistic
is asymptotically distributed as $\chi^2_{2(K+1)}$ in the unconstrained
two-parameter model. For the triangle-constrained two-parameter
model, with the addition of K covariates the distribution of the
CL-LR statistic is a mixture of a point mass at 0 and several $\chi^2$
distributions with up to 2(K + 1) df, asymptotically. However, no covariates
are allowed in the two-parameter model in the LODPAL program
in the S.A.G.E. package (2012), owing to the practical difficulty of
maximizing the likelihood of models with two additional parameters
for each covariate. Therefore, in this study we did not consider
the two-parameter models with covariates.

For the one-parameter model, addition of covariates requires
one additional parameter for each covariate. With the addition
of K covariates, without any additional constraints imposed on
covariate parameters the CL-LR statistic is asymptotically
distributed as $\chi^2_{K+1}$ in the unconstrained one-parameter model.
Addition of K covariates in the constrained one-parameter
model, again without any additional constraints imposed on the
covariate parameters, gives a CL-LR statistic with a distribution
that is asymptotically a 50:50 mixture of a $\chi^2$ with K df and a
$\chi^2$ with K + 1 df (Goddard et al., 2001). In this study, we only
included the constrained one-parameter model with covariate(s),
and this is referred to as the covariate model.

Depending on additional constraints on the covariates, we
define two covariate models. By including a "mean-centered"
covariate $(x - \bar{x})$, no constraints on the $\delta$ are required (Olson,
1999), so the CL-LR statistic is asymptotically distributed as a
50:50 mixture of two $\chi^2$ distributions depending on the number of such
covariates, as stated previously. This is reasonable for many
covariates, in particular continuous covariates such as age. We
refer to this as the unconstrained covariate model.

However, for some covariates, such as indicator variables that
represent different populations or a binary factor, the offset from
the minimum value of the covariate, i.e., "minimum-adjusted,"
$[x_a = x - \min(x)]$ is included in the model, so that the smallest
value of the covariate equals zero. For such covariates, the
constraint $\min \sum_k x_{ak}\delta_{1k} \ge -\beta_1$ is applied; it is not then feasible to
derive the asymptotic distribution of the CL-LR statistic under
the null hypothesis theoretically, since it depends on the distribution
of the covariate values in the given data. We refer to this as
the constrained covariate model.



SIMULATIONS

To examine the precision of the expected asymptotic distributions
in the previous section, we used simulation to determine
the empirical null distributions of the CL-LR statistics. We
considered the 6 different analysis models described in the previous
section. We considered the covariate model with just one covariate.
For the unconstrained covariate model, we included a
mean-centered continuous covariate. For the constrained
covariate model, we included a minimum-adjusted binary
covariate.

We first simulated 100,000 replicates of 500 nuclear families
having two parents and two affected siblings, i.e., 500 independent
ASPs. For each case, one fully informative unlinked marker
was simulated by assigning a unique allele to each founder, and
then the alleles were randomly segregated to all offspring. For
covariate models, under the null hypothesis of no linkage and
no covariate effect, the covariate was simulated such that it was
correlated with affection status but not with genotype. A random
continuous value from a normal distribution with mean 0 and
variance 1 was first assigned to each individual, regardless of
affection status. Then a continuous covariate was simulated by adding
a pre-fixed covariate effect to this value. A binary covariate was
generated by dichotomizing this continuous covariate such that
its population prevalence was 0.2. Given the covariate values for
each member of the pair, the pair-level covariate for a pair was
created by summing the two individual-level covariates. The
continuous pair-wise covariate values for the unconstrained covariate
model were mean-centered, and the binary pair-wise values for
the constrained covariate model were minimum-adjusted when
they were included in the analysis.

To check the performance of the expected asymptotic null
distribution for each analysis model under different sample sizes,
we also simulated 100,000 replicates of 30, 50, and 100 families,
as above. Additionally, the precision of the approximated
asymptotic null distributions of the CL-LR statistics for the
constrained two-parameter model was compared with the empirical
null distributions under different marker information levels. We
simulated 100,000 replicates of 100 independent ASPs for markers
with 2, 4, 8, and 20 equally frequent alleles. These numbers
correspond to PIC values of 0.38, 0.70, 0.86, and 0.95, respectively. We
checked two cases, when both parents are typed and when neither
is typed.

The empirical p-value corresponding to the LOD score was
determined by assigning p = (r + 1)/(100,000 + 1) to the rth of
the ranked LOD scores from 100,000 replicates. The asymptotic
p-value corresponding to the same LOD score was calculated
using the known or approximated asymptotic distribution, as
described above.

RESULTS

ASYMPTOTIC NULL DISTRIBUTIONS UNDER TRIANGLE CONSTRAINTS

The resulting triangles under assumption A1 are graphically
illustrated in Figure 2, showing the steps to derive the mixing
proportion for a given value of A. In this figure, the possible
triangle space for ASPs on the original $(\beta_1, \beta_2)$ plane is
in black, formed by the three vertices (N, A, D) = {[0, 0],
[1/2, $\log_e(2e^{1/2} - 1)$], [0, $\log_e(2e^{1/2} - 1)$]}. Then, we have
$y_A = (3.219, 0)^T$, $y_N = (0, 0)^T$, and $y_D = (2.236, 0.731)^T$.
The corresponding orthogonally transformed triangle ($Y_N$, $Y_A$,
$Y_D$) is in blue, and the green dashed triangle ($y_N$, $y_A$, $y_D$) is the
same orthogonally transformed triangle after rotation such that
$y_A$ lies on the $\beta_1$ axis and the ray defined by $y_N$ and $y_D$ becomes
the hypotenuse in the upper right quadrant of the plane. Then
the angle $\theta$ formed by the two rays $y_N y_A$ and $y_N y_D$ in the green
triangle is $\arctan(0.731/2.236) \approx 0.316$, and the corresponding mixing
proportion $c_1$ is $0.316/(2\pi) \approx 0.050$. By following the same steps, we
find the mixing proportions to be $c_2 \approx 0.044$ and $c_3 \approx 0.054$,
respectively, under the A2 and A3 assumptions.




FIGURE 2 | The distribution of constrained CL-LR statistics under the
A1 approximation. The black area (N, A, and D) is the original possible
triangle space for ASPs, the blue area ($Y_N$, $Y_A$, and $Y_D$) is the orthogonally
transformed triangle, and the green dashed triangle ($y_N$, $y_A$, and $y_D$) is the
space after rotation. The angle $\theta$ formed by the two rays $y_N y_A$ and $y_N y_D$
represents the mixing probability c.

The value of $c_2$ obtained from the A2 assumption provides the
minimum bound for c and, from the A1 and A3 assumptions, we
can see that the mixing proportion value c becomes larger as we
take a larger upper value for $\beta_1$. Figure 3 shows how the value of
c depends on the value of the parameter $\beta_1$. It can be seen that
the maximum value converges to around 0.070, which is smaller
than the value for the RH model. The critical LOD score values
corresponding to the test sizes 0.05, 0.01, 0.001, 0.0001 [the
classical "LOD score 3" criterion given by Morton (1955)], 0.000049
[significant evidence for linkage given by Lander and Kruglyak
(1995)] and 0.00001 are given in Table 1 for the different mixing
proportion values. Given the same size of test, the critical LOD
scores for the CL model are smaller than those for the RH model.
Therefore, the null hypothesis is more likely to be rejected using
the CL-LR test, and the CL-LR statistic is more powerful.
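
Critical LOD scores of this kind can be obtained by inverting the mixture tail probability numerically. The following sketch uses simple bisection; the function names, the search bound, and the tolerance are assumptions for illustration and are not taken from the original analysis.

```python
import math

LN10 = math.log(10.0)

def mixture_tail(x, c):
    """P(T > x) for x > 0 under (1/2 - c) chi2_0 + 1/2 chi2_1 + c chi2_2."""
    return 0.5 * math.erfc(math.sqrt(x / 2.0)) + c * math.exp(-x / 2.0)

def critical_lod(alpha, c, hi=100.0, tol=1e-10):
    """LOD score whose asymptotic p-value equals alpha (requires alpha < 1/2 + c)."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_tail(mid, c) > alpha:
            lo = mid            # tail still too heavy: move right
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * LN10)   # convert LR statistic to LOD

crit_rh = critical_lod(0.05, 0.098)     # RH model
crit_min = critical_lod(0.05, 0.0439)   # smallest c for the CL model
```

For a test size of 0.05, this gives approximately 0.742 for c = 0.098 and 0.662 for c = 0.0439, the RH-c and minimum-c entries of Table 1.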

EMPIRICAL NULL DISTRIBUTIONS 

Two-parameter model 

In Figure 4, we show plots of $-\log_{10}$(empirical p-value) against
$-\log_{10}$(asymptotic p-value) corresponding to the observed
CL-LR statistics with a sample size of 500 for the two two-parameter
models. For the unconstrained model, the empirical p-values
matched well the asymptotic p-values from the expected chi-square
distribution with 2 df. For the constrained model, the mixture
distribution from the A1 assumption was also close to the
empirical distribution. Since the mixing proportions from the three
approximations are so close to each other, the empirical
distributions matched the asymptotic distributions well for all three
different mixing proportions (results not shown).

FIGURE 3 | The range of the mixing proportion values according to the
different $\beta_1$ values for the distribution of the CL-LR statistics from
the constrained two-parameter model.

Table 1 | Critical LOD scores obtained from the constrained
two-parameter models for different mixing proportion values;
CL-c_min and CL-c_max are the minimum and maximum c values for
the CL model, A1-c is the value from the A1 approximation, and RH-c
is the mixing proportion for the RH model.

Mixing        Size of test
proportion    0.05     0.01     0.001    0.0001   0.000049   0.00001

CL-c_min      0.662    1.276    2.202    3.154    3.452      4.118
A1-c          0.672    1.289    2.219    3.172    3.470      4.138
CL-c_max      0.702    1.328    2.265    3.225    3.524      4.195
RH-c          0.742    1.377    2.324    3.290    3.591      4.265

For each sample size simulated, the specific LOD score values
corresponding to the empirical p-values 0.05, 0.01, 0.001, and
0.0001 for these two models are given in Figure 5, compared with
the theoretical values (shown as a red line for each p-value). These
values are the critical values for the type I error rates equal to the
given empirical p-values. Overall, for all sample sizes, the critical
LOD scores from the empirical distributions were similar and
very close to the values from the asymptotic distributions
up to about $-\log_{10}$(p-value) = 3. When the type I error rate was 0.0001,
the critical LOD scores varied depending on the sample size.
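
The comparison of empirical and asymptotic critical values can be sketched on a small scale for the unconstrained two-parameter model. The sketch assumes a fully informative marker, so that each pair's IBD count is observed and the maximized likelihood uses the observed sharing proportions; it uses far fewer replicates than the 100,000 used in the study, and all names are hypothetical.

```python
import math
import random

PRIOR = (0.25, 0.5, 0.25)

def unconstrained_lod(counts):
    """Maximized LOD for IBD counts (n0, n1, n2) with both parameters free."""
    n = sum(counts)
    return sum(c * math.log10((c / n) / f)
               for c, f in zip(counts, PRIOR) if c > 0)

def simulate_null_lods(n_pairs, n_reps, seed=1):
    """LOD scores for replicates of unlinked ASPs, sorted from largest to smallest."""
    rng = random.Random(seed)
    lods = []
    for _ in range(n_reps):
        draws = rng.choices((0, 1, 2), weights=(1, 2, 1), k=n_pairs)
        lods.append(unconstrained_lod([draws.count(i) for i in range(3)]))
    return sorted(lods, reverse=True)

def empirical_pvalue(rank, n_reps):
    """p = (r + 1) / (n_reps + 1) for the r-th ranked LOD score."""
    return (rank + 1) / (n_reps + 1)

def empirical_critical_lod(sorted_lods, alpha):
    """LOD score at the largest rank whose empirical p-value is <= alpha."""
    r = int(alpha * (len(sorted_lods) + 1) - 1)
    return sorted_lods[r - 1]

null_lods = simulate_null_lods(n_pairs=200, n_reps=5000)
crit_emp = empirical_critical_lod(null_lods, 0.05)
crit_asy = -math.log10(0.05)     # chi-square with 2 df: LOD = -log10(alpha)
```

For a chi-square distribution with 2 df, the asymptotic critical LOD score equals $-\log_{10}(\alpha)$, about 1.30 at a test size of 0.05, and the simulated value should fall close to it, with larger Monte Carlo variability at smaller test sizes.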

The empirical null distributions under different marker infor- 
mation levels for the constrained two-parameter model are shown 
in Figure 6 (A for parents typed, B for parents not typed). For 
the two types of parental information, the specific LOD score val- 
ues corresponding to the empirical p-values 0.05, 0.01, 0.001, and 
0.0001 are again compared with the theoretical values from the 
Al assumption (shown as a red line for each p-value). Again, it 
can be seen that the approximated asymptotic null distribution 
well matched the empirical distribution for the different levels of 
marker information, both in terms of the number of alleles and 
the amount of parental information. 
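
The marker information levels used here were described in terms of polymorphism information content (PIC). As a hedged illustration, the sketch below evaluates the standard PIC formula, $1 - \sum_i p_i^2 - \sum_{i<j} 2p_i^2 p_j^2$, for markers with equally frequent alleles; the function names are hypothetical.

```python
def pic(freqs):
    """Polymorphism information content for allele frequencies summing to 1."""
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return 1.0 - homozygosity - cross

def pic_equal_alleles(n_alleles):
    """PIC of a marker with n_alleles equally frequent alleles."""
    return pic([1.0 / n_alleles] * n_alleles)

pic_values = {n: pic_equal_alleles(n) for n in (2, 4, 8, 20)}
```

For 2, 4, 8, and 20 equally frequent alleles, the formula gives values that round to the PIC levels of 0.38, 0.70, 0.86, and 0.95 stated for the simulated markers.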

One-parameter model 

Here again, we found that the distribution of LOD scores follows
the theoretical distribution well (results not shown). For both
one-parameter models, the empirical p-values matched well the
asymptotic p-values from the expected chi-square distributions.
For the unconstrained case, the CL-LR statistic was distributed as
a $\chi^2_1$, as expected. The empirical distribution of the CL-LR
statistics for the constrained model followed closely a 50:50 mixture of
a point mass at 0 and a $\chi^2_1$, which again agrees with the asymptotic
distribution. For all sample sizes, the critical LOD scores from
the empirical distributions were again similar and very close to
the values from the asymptotic distributions up to about
$-\log_{10}$(p-value) = 3, and they varied depending on the sample
size when the type I error rate was 0.0001, as for the two-parameter
model.

Covariate model 

In Figure 7, we show the distributions of empirical p-values under
the null hypothesis of no linkage for the unconstrained covariate
model. The empirical p-values for the covariate model with
one unconstrained continuous covariate matched well the
asymptotic p-values from a 50:50 mixture of a $\chi^2_1$ and a $\chi^2_2$ distribution
when the sample size was 500, as expected. However, unlike
other analysis models, the distribution of LOD scores did not
follow the theoretical distribution for the smaller sample sizes.
We found that the empirical null distribution departed more from
the asymptotic null distribution the smaller the sample size, as
expected. For example, the critical LOD scores were over 10.0
for sample sizes 30, 50, and 100, compared to 3.77 from the
asymptotic distribution for the test size 0.0001.

FIGURE 4 | Null distributions of the CL-LR statistics for the
two-parameter models, using 500 independent ASPs and a fully
informative marker. The empirical p-values for the observed LR statistics
(y-axis) are plotted against the asymptotic p-values from the known chi-square
distribution (x-axis) for the unconstrained model (A) and for the constrained
model (B). Note that the asymptotic distribution for the constrained model is
under the A1 assumption, and a 95% confidence interval is shown by the
dotted red line.

FIGURE 5 | The LOD score values corresponding to the empirical
p-values 0.05, 0.01, 0.001, and 0.0001 for the unconstrained
two-parameter model (A) and the constrained two-parameter model (B),
by sample size and size of the test. These values are the critical values for
the type I error rates equal to the given empirical p-values. The theoretical
values are shown as a red line for each p-value.

FIGURE 6 | LOD score values corresponding to the empirical p-values
0.05, 0.01, 0.001, and 0.0001 under different marker information levels for
the constrained two-parameter model, when the parents are typed (A)
and not typed (B). These values are the critical values for the type I error
rates equal to the given empirical p-values. The theoretical values are shown
as a red line for each p-value.

FIGURE 7 | Null distributions of the CL-LR statistics for the
unconstrained covariate models, using 30, 50, 100, and 500 independent
ASPs and a fully informative marker. The empirical p-values for the
observed LR statistics (y-axis) are plotted against the asymptotic p-values
from the known chi-square distribution (x-axis) for the unconstrained
covariate model. The dotted red line is the 95% confidence interval.

FIGURE 8 | Null distributions of the CL-LR statistics for the constrained
covariate model, using 500 independent ASPs and a fully informative
marker. The empirical p-values for the observed LR statistics (y-axis) are
plotted against the asymptotic p-values from a 50:50 mixture of a $\chi^2_1$ and a $\chi^2_2$
distribution (A), and from a 50:50 mixture of a point mass at 0 and a $\chi^2_1$ (B).
The dotted red line is the 95% confidence interval.

FIGURE 9 | LOD scores corresponding to the empirical p-values
0.05, 0.01, 0.001, and 0.0001 for the constrained covariate model
by sample size and size of test. These values are the critical
values for the type I error rates equal to the given empirical
p-values. The theoretical values are shown as a red line for each
p-value. The dotted lines are from a 50:50 mixture of a $\chi^2_1$ and a $\chi^2_2$
distribution and the solid lines are from a 50:50 mixture of a
point mass at 0 and a $\chi^2_1$.



For the constrained covariate model with a minimum- 
adjusted binary covariate, we show the empirical null distribution 
compared with two asymptotic distributions in Figure 8, one 
with a 50:50 mixture of a Xi an d a /J distribution (A) and the 
other with a 50:50 mixture of a point mass at 0 and Xi distribu- 
tion (B). The asymptotic p-values from a 50:50 mixture of a Xi 
and a /J distribution were too conservative, while the asymptotic 
p-values from a point mass at 0 and x \ distribution well matched 
the empirical p-values. In the simulated data for this model, the 
possible pair-wise covariate values are 0, 1, or 2, since we included 
the sum of two individual binary covariate values. Since $\beta_1 \geq 0$
and $\min_{x_{aj} > 0} \sum_k \delta_k x_{ajk} \geq -\beta_1$, $\delta_1 \geq 0$ when $\beta_1 = 0$. When $\beta_1 > 0$,
the minimum value of $\delta_1$ is $-\beta_1/2$. Therefore, the two-parameter
space is constrained to be 1/3 of the whole plane, instead of 1/2
of the plane, which causes the asymptotic p-values from a 50:50
mixture of a $\chi^2_1$ and a $\chi^2_2$ distribution to be too conservative. In
practice, the distribution will depend on the distribution of the 
covariate values in the data. 
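
The geometric argument above can be checked with a few lines of arithmetic. The sketch below assumes the constrained region is the wedge $\beta_1 \geq 0$, $\delta_1 \geq -\beta_1/x_{max}$, where $x_{max}$ is the largest pair-wise covariate value (2 in the simulation described here), and it measures the raw angular fraction of the $(\beta_1, \delta_1)$ plane covered by that wedge. The function name `plane_fraction` is illustrative, and the raw angle is only a rough guide: it is not the mixing probability of the asymptotic distribution, which would also depend on the information matrix.

```python
import math

def plane_fraction(x_max: float) -> float:
    """Fraction of the (beta1, delta1) plane with
    beta1 >= 0 and delta1 >= -beta1 / x_max (raw Euclidean angle)."""
    if x_max <= 0:
        raise ValueError("x_max must be positive")
    # The region is a wedge between the ray (0, 1), at angle pi/2,
    # and the ray (x_max, -1), at angle -atan(1 / x_max).
    wedge_angle = math.pi / 2 + math.atan(1.0 / x_max)
    return wedge_angle / (2 * math.pi)

# Larger maximum covariate values give a narrower wedge.
for x_max in (1, 2, 3, 4, 8):
    print(x_max, round(plane_fraction(x_max), 3))
```

Under these assumptions a maximum covariate value of 2 gives a fraction close to 1/3, and the fraction shrinks toward 1/4 as the maximum covariate value grows.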

In Figure 9, the specific LOD score values corresponding to the 
empirical p-values 0.05, 0.01, 0.001, and 0.0001 are given for each 
sample size simulated. These values are again the critical values 
for the type I error rates equal to the given empirical p-values,
compared with theoretical values (shown as a red line for each
p-value). The dotted lines are from a 50:50 mixture of a $\chi^2_1$ and a
$\chi^2_2$ distribution, and the solid lines are from a 50:50 mixture of a
point mass at 0 and a $\chi^2_1$ distribution.
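
As an illustration of how an empirical critical LOD score of this kind can be read off a set of null replicates, the following sketch takes a list of simulated LR statistics and returns the LOD score exceeded by a fraction alpha of them. The replicates here are stand-ins drawn directly from a 50:50 mixture of a point mass at 0 and a $\chi^2_1$ distribution, not output from the linkage simulations in this study, and all function names are hypothetical.

```python
import math
import random

TWO_LN10 = 2 * math.log(10)  # LR statistic = 2 ln(10) * LOD

def empirical_critical_lod(lr_stats, alpha):
    """LOD score exceeded by a fraction alpha of the null LR statistics."""
    ordered = sorted(lr_stats)
    # Index of the (1 - alpha) empirical quantile.
    idx = min(len(ordered) - 1, math.ceil((1 - alpha) * len(ordered)) - 1)
    return ordered[idx] / TWO_LN10

def simulate_point_mass_chi2_1(n, rng):
    """Stand-in null replicates: 0 with probability 1/2, else chi-square (1 df)."""
    return [rng.gauss(0, 1) ** 2 if rng.random() < 0.5 else 0.0
            for _ in range(n)]

rng = random.Random(2024)
null_stats = simulate_point_mass_chi2_1(200_000, rng)
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(empirical_critical_lod(null_stats, alpha), 3))
```

With real data the list `null_stats` would be replaced by LR statistics computed from replicates simulated under no linkage; the accuracy of the smallest test sizes depends on the number of replicates.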

DISCUSSION 

In the RH model, the mixing probability c (which represents the 
probability that the allele sharing estimates fall inside the possi- 
ble triangle) is the same for any two allele-sharing parameters. 
However, this is not so in the CL model owing to the nonlinear
relation between the two parameters $\beta_1$ and $\beta_2$, the logarithms
of relative risks. In this paper, we developed three approximations
to the asymptotic distributions of the CL-LR statistics
for the constrained two-parameter model, under the null hypothesis
of no linkage, for independent ASPs. We derived the mixing
probability c assuming complete information, as was done for the
RH model with Risch's allele sharing parameters, following the
method given by Self and Liang (1987). From these three approximations,
we also investigated the relation between the parameter
values for $\beta_1$ and c. We found the range of the c values to be
(0.0439-0.070), which is lower than the value obtained for the RH
model. This results in critical LOD score values lower by 5-11%
(0.702-0.662 vs. 0.742) for a test size 0.05, and by 3-5% (2.265-
2.202 vs. 2.324) for a test size 0.001, compared to the RH model.
Therefore, the test using the CL-LR statistic will be more powerful,
though perhaps not significantly so. In practice, the estimate
of $\beta_1$ can be used to decide on an appropriate value for c to obtain
a reasonably accurate test of linkage for a particular set of data.
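
To make the link between c and the critical LOD score concrete, the sketch below solves for the critical value numerically. It assumes the null distribution is the mixture $(0.5 - c)\chi^2_0 + 0.5\chi^2_1 + c\chi^2_2$, the form commonly used for the possible-triangle test, and it takes c = 0.098 as the RH-model value; both are assumptions not stated in this section, although they reproduce the critical values quoted above to about three decimal places. Function names are illustrative.

```python
import math

TWO_LN10 = 2 * math.log(10)  # LR statistic = 2 ln(10) * LOD

def mixture_pvalue(t, c):
    """P(T > t), t > 0, for T ~ (0.5 - c)*chi2_0 + 0.5*chi2_1 + c*chi2_2."""
    sf_chi2_1 = math.erfc(math.sqrt(t / 2))  # upper tail, 1 df
    sf_chi2_2 = math.exp(-t / 2)             # upper tail, 2 df
    return 0.5 * sf_chi2_1 + c * sf_chi2_2

def critical_lod(alpha, c, tol=1e-10):
    """Solve mixture_pvalue(t, c) = alpha by bisection; return t / (2 ln 10)."""
    lo, hi = 1e-12, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_pvalue(mid, c) > alpha:
            lo = mid   # p-value still too large: move right
        else:
            hi = mid
    return hi / TWO_LN10

for c in (0.0439, 0.070, 0.098):
    print(c, round(critical_lod(0.05, c), 3), round(critical_lod(0.001, c), 3))
```

Because the p-value is increasing in c for a fixed statistic, a smaller c always gives a smaller critical LOD score, which is the direction of the difference reported between the CL and RH models.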

By simulation, the performance of the approximate asymp- 
totic distribution was checked for various sample sizes both when 
there is perfect information and under different marker informa- 
tion levels. This was done for two different parental information 
cases (typed and not typed) for a fixed sample size of 100 inde- 
pendent ASPs. Generally, for all sample sizes and the different 
levels of information content investigated, we found the empiri- 
cal null distribution of the CL-LR statistic from the constrained 
two-parameter model matches well the approximated asymptotic 
distribution. This result shows the applicability of the approximated 
asymptotic distribution to real data analysis for any marker. 

For the unconstrained two-parameter model, the unconstrained 
one-parameter model, and the constrained one-parameter model, 
we also found that the known asymptotic distributions matched 
the empirical distributions well. Therefore, for these models, the 
test of linkage using the CL-LR statistic can be performed using
the known asymptotic null distribution to find the p-value. The
unconstrained models may not be biologically plausible, but could 
be useful for the purpose of comparison, or when the data include 
ASPs with a different direction of genetic effect caused by other 
factors, as investigated by Dizier et al. (2000). 

Unlike for the other models, a large sample size was needed 
for the asymptotic distribution to hold well for the unconstrained 
covariate model, i.e., the constrained one-parameter model with 
an unconstrained covariate. Sinha et al. (2006) also reported this 
vast discrepancy between the asymptotic p-values and the empir- 
ical p-values for this model. Their result was based on average 
sample sizes of 20, 40, 80, 120, and 320 affected pairs. To deter- 
mine the sample size necessary for the asymptotic p-values to 
be applicable, we additionally simulated 200 and 300 ASPs. This 
showed that with 200 ASPs the empirical distribution matched 
well the asymptotic distribution (results not shown). Therefore, 
in practice, for this model we recommend the use of simulation 
methods or the Sinha et al. method when the sample size is less 
than 200, to ensure accurate p-values. 

Though the results are not shown, additional simulations
with two and three covariates and 500 ASPs showed that, except in the
tail, the distributions of the CL-LR statistics for the unconstrained
covariate model with two covariates also closely matched a 50:50
mixture of a $\chi^2_2$ and a $\chi^2_3$ distribution, and that for three covariates
a 50:50 mixture of a $\chi^2_3$ and a $\chi^2_4$ distribution, as expected from the
asymptotic distributions. These results confirm that the empirical distribution of
the CL-LR statistic for comparing nested unconstrained covariate
models that differ by $l$ covariates has a $\chi^2$ distribution with $l$ df,
as expected from the asymptotic distribution. Therefore, in large
samples it is valid to test the significance of the contribution of a 
covariate using the asymptotic distribution. 
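
A test of this kind can be sketched as follows. The code computes the upper-tail probability of a $\chi^2$ distribution with integer df using closed-form sums from the standard library, and applies it to twice the difference in maximized natural-log likelihoods of two nested models. The function names and the example log-likelihood values are hypothetical; they are not taken from the analyses in this study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def nested_lr_pvalue(loglik_full, loglik_reduced, n_extra_covariates):
    """Asymptotic p-value for adding n_extra_covariates unconstrained covariates."""
    lr = 2 * (loglik_full - loglik_reduced)
    return chi2_sf(lr, n_extra_covariates)

# Hypothetical log-likelihoods for a model with and without one covariate.
print(nested_lr_pvalue(-120.0, -123.0, 1))
```

As noted in the text, this asymptotic p-value should be relied on only in large samples; in smaller samples a simulation-based p-value is the safer choice.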

It was interesting to find in our simulated data that the empir- 
ical null distribution for the constrained covariate model, i.e., 
constrained one-parameter model with a constrained covariate, 
was closer to a 50:50 mixture of a point mass at 0 and a $\chi^2_1$ distribution
than to a 50:50 mixture of a $\chi^2_1$ and a $\chi^2_2$ distribution.
This is due to the functional dependency of $\delta_1$ on the maximum
covariate value in the data when $\beta_1 > 0$. This dependency
effectively reduces the degrees of freedom and hence changes 
the distribution. To show how the range of the covariate val- 
ues in the data changes the null values of the parameters, and 
therefore the distribution of the CL-LR statistics, we additionally 
simulated datasets with pair-wise covariate values (0 or 1), (0, 
1, 2, or 3), (0, 1, 2, 3, or 4), and a random number in the 
range (0, 8). In Figure 10, we show a plot of the estimates of 
the parameters $\beta_1$ and $\delta_1$, including the result from the (0,
1, or 2) case in the previous simulation. We can see that the
space for two parameters becomes smaller as the maximum value
of the minimum-adjusted covariate increases. For the (0 or 1)
case, it seems the CL-LR statistics will be closely distributed as
the mixture $c_0\chi^2_0 + c_1\chi^2_1 + c_2\chi^2_2$. In other cases, a 50:50 mixture
of a point mass at 0 and a $\chi^2_1$ distribution closely matched
the empirical distribution. Therefore, in practice, the distribu- 
tion will depend on the distribution of the covariate values in 
the dataset, so careful examination of the distributions of the 
covariates in the dataset is needed before including them in any 
analysis. 

We did not include any power analysis in this study because 
our purpose was to find an approximation to the theoretically 
unknown null distributions and to compare them with the empir- 
ical null distribution, to provide guidelines for testing linkage 
when using the CL-LR statistics in various analysis models. To our 
knowledge, there has not been any study of the null distribution 
of LOD scores for the CL model, either theoretical or empirical.
The results from this study should provide useful guidelines
for the linkage analysis of real datasets since our results are based 
on both a perfect scenario as well as on non-perfect cases. Our 
results for various sample sizes will also provide guidelines for 
cases with missing data, since these will in general correspond 
to a reduced sample size. We assumed no errors in the relation- 
ship between pairs. When the information content in the marker 
and/or pedigree structure in real data are reduced due to errors 
in the data, we would generally expect the power to be lower for
a given type I error rate; but the test of linkage based on our results
will still be valid, as long as the analysis is done on independent 
pairs. 